This report is a repository for the results obtained from a large scale empirical comparison of seven
iterative and evolution-based optimization heuristics. Twenty-seven static optimization problems, spanning
six sets of problem classes which are commonly explored in genetic algorithm literature, are examined.
The problem sets include job-shop scheduling, traveling salesman, knapsack, binpacking, neural network
weight optimization, and standard numerical optimization. The search spaces in these problems range
from 2^368 to 2^2040. The results indicate that using genetic algorithms for the optimization of static functions
does not yield a benefit, in terms of the final answer obtained, over simpler optimization heuristics.
The algorithms tested and the encodings of the problems are described in detail for reproducibility.

1. INTRODUCTION 

Genetic algorithms (GAs) and other evolutionary procedures are commonly used for static function
optimization. Although there has been growing evidence that methods such as GAs are, in general, not well
suited in this domain [De Jong, 1992], a large amount of research has been devoted to improving their
effectiveness for function optimization. Hybrid mechanisms, ranging from alternate evolutionary meth- 
ods to specialized operators and representations which can intelligently use problem specific informa- 
tion, have achieved good results in many specific applications. Nonetheless, relatively few of these 
techniques work well across a wide range of problems. 

The aim of this paper is to compare two standard genetic algorithms with simpler methods of optimization:
multiple-restart stochastic hillclimbing (MRSH) and population-based incremental learning (PBIL).
Previous comparisons between forms of MRSH and GAs can be found in [Ackley, 1994], [Juels & Wattenberg,
1994], [Forrest & Mitchell, 1992], [Mitchell & Holland, 1994], and [Davis, 1991], to name a few. A
comparison between GAs and PBIL has been made in [Baluja, 1994] [Baluja & Caruana, 1995]. This paper
provides a large scale empirical comparison of these algorithms on problems commonly found in GA lit- 
erature. Three variants of MRSH, two variants of PBIL, and two GAs are compared. 

1.1 The Aims of this Paper 

This study aims at answering only one question: "How effective are standard GAs for optimizing static
functions, given a set number of function evaluations, in comparison to other, simpler, algorithms?" This
paper presents results on many large problems; the size and quantity of the problems make it hard to
give in-depth analysis of the results beyond the algorithms' relative performances. A more in-depth
analysis of PBIL in comparison to standard GAs on a problem which was specifically designed to be easy for
the genetic algorithm (and easier to analyze than the problems explored here) is provided in [Baluja &
Caruana, 1995]. This paper does not attempt to address the problem of whether the classes of problems
investigated are suited for evolutionary or iterative function optimization. The focus of this paper is on
comparing seven static function optimization methods on problems which are representative of problems 
commonly used as benchmarks in GA literature. No problem specific features have been added to any of 
the algorithms; all of the mechanisms used in the algorithms are "standard", and have been explored and 
described in the applicable literature. The inclusion of problem specific mechanisms or more sophisti- 
cated features has the potential to improve the performance of all the algorithms. 

There are two major concerns with performing a purely empirical comparison of these algorithms. The 
first is that each of these algorithms is defined by control parameters, and it is prohibitively expensive, in 
practice, to thoroughly explore the space of the parameters while providing breadth in the types and 
sizes of problems attempted. The GA parameters used here were chosen to work well on many of the 
problems, but are not biased to any particular single problem. The parameters for the other algorithms 
were chosen in the same manner. In addition, GAs were selected to perform well on the task of optimiza- 
tion; they use mechanisms, such as elitist selection and scaling of fitness values, which are often used for 
the optimization of static functions [De Jong, 1992]. One of the goals of this study is to use the algorithms with 
as little problem-specific knowledge as possible. The only problem-specific knowledge used in these algo- 
rithms is the number of bits in the solution encoding for each of the problems. 

The second concern is that there are many criteria by which the effectiveness of each algorithm can be 
measured. As mentioned before, there has recently been some controversy in the GA community as to 
whether GAs should be used for static function optimization. One of the reasons for this controversy is
that GAs "attempt to maximize the cumulative payoff of a sequence of trials" [De Jong, 1992] rather than
attempt to find the single best optimum. Therefore, using the "best answer found" criterion may not be the
best way to measure the GA's abilities. Nonetheless, a considerable amount of effort has been devoted to 
making the GAs better in function optimization. "Better" has usually been measured in terms of the best 
solution found in a given number of trials. The common forms of measurement for function optimization 
are on-line and off-line performance. On-line performance measures the average of all function evalua- 
tions up to and including the current evaluations. Off-line performance is a running average of the best 
performance values to a particular time. Other measurements include the best solution found in the final 
generation and the best solution found in any generation through the search. Although all these
measures reveal different insights into the search algorithm's ability, the measure of interest in this
study is the best solution ever found through the search. The issues of cumulative payoff, on-line and
off-line performance are not addressed here. The effectiveness of each algorithm is based solely upon the best
answer it can find in the given number of trials.
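
The measures named above can be made concrete with a small sketch. It assumes a maximization problem and a list holding the evaluation obtained at each trial; the function names and the sample trace are illustrative only and do not come from the experiments in this report.

```python
def online_performance(evals):
    """Average of all function evaluations up to and including each trial."""
    averages, total = [], 0.0
    for t, value in enumerate(evals, start=1):
        total += value
        averages.append(total / t)
    return averages


def offline_performance(evals):
    """Running average of the best value found up to each trial."""
    averages, total, best = [], 0.0, float("-inf")
    for t, value in enumerate(evals, start=1):
        best = max(best, value)
        total += best
        averages.append(total / t)
    return averages


def best_ever(evals):
    """The measure used in this study: best evaluation found during the run."""
    return max(evals)


# Hypothetical trace of four evaluations.
trace = [1.0, 3.0, 2.0, 5.0]
print(online_performance(trace), offline_performance(trace), best_ever(trace))
```

On-line performance rewards every trial, off-line performance rewards only the running best, and the best-ever measure ignores how quickly the best solution was reached.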

It is important to understand the scope of these results. All of the empirical comparisons are based upon
static function optimization problems. The performance of each method is judged solely by the best
solution found during the run, given a pre-specified number of total evaluations. Therefore, the following
classes of problems are not considered here, and should be explored in the future:


• Noise in the evaluation function [Grefenstette & Fitzpatrick, 1988].

• A changing, or time-varying, evaluation function (over the period of a single run) [Cobb, 1993].

• Problems in which queries have an associated cost, which must also be minimized [Cohn, 1994].

• Problems in which multiple "solution vectors" must interact [Langton, 1994].

• Problems in which cumulative payoff is to be optimized [Holland, 1975], [Goldberg, 1989].

• Problems which use variable-length encodings, or encodings which change over time [Koza, 1992].


Although the above domains are not addressed here, the domain which is concentrated upon covers a 
wide variety of problems. A large portion of GA research has been devoted to the types of problems
analyzed in this paper. The field of Operations Research is another source of many similar problems.

The next section describes the simplest algorithm tested, multiple-restart stochastic hillclimbing. This
section is followed by descriptions of genetic algorithms, in section 3, and population-based incremental
learning in section 4. In section 5, the problems attempted and the results obtained are described together. 
Section 6 summarizes the empirical results. Section 7 concludes the report and suggests some areas for 
future studies. 

2. MULTIPLE-RESTART STOCHASTIC HILLCLIMBING 

Multiple-restart stochastic hillclimbing (MRSH) is a method of iterative optimization of static functions. It
is the simplest of the optimization procedures explored in this paper. [Juels & Wattenberg, 1994] have
compared one version of stochastic hillclimbing with GAs on several problems commonly used for gauging
genetic algorithms and genetic programming, and have achieved very promising results. The basic
stochastic hillclimbing algorithm is shown in Figure 1.


    V ← randomly generate solution vector
    Best ← evaluate (V)

    loop # ITERATIONS
        N ← Flip_Random_Bit (V)
        if (evaluate (N) > Best)
            Best ← evaluate (N)
            V ← N

    Flip_Random_Bit is a function which returns a solution string with only
    one bit changed from its input solution string.


Figure 1: The stochastic hillclimbing algorithm for binary solution vectors. In the full algorithm, 
the best vector along with its evaluation would be saved. In practice the algorithm could be 
restarted in random locations many times - and the best solution ever found returned. 
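
As an illustration of Figure 1, the following is a minimal Python sketch of a single hillclimbing run. The `one_max` objective (count of 1 bits) and the explicit random number generator argument are assumptions made for the example; they are not among the problems or settings used in this study.

```python
import random


def flip_random_bit(vector, rng):
    """Return a copy of `vector` with exactly one randomly chosen bit flipped."""
    neighbor = list(vector)
    i = rng.randrange(len(neighbor))
    neighbor[i] ^= 1
    return neighbor


def stochastic_hillclimb(evaluate, length, iterations, rng):
    """One run of the loop in Figure 1; only strictly better neighbors are accepted."""
    v = [rng.randint(0, 1) for _ in range(length)]
    best = evaluate(v)
    for _ in range(iterations):
        n = flip_random_bit(v, rng)
        score = evaluate(n)
        if score > best:
            best, v = score, n
    return v, best


def one_max(bits):
    """Toy objective used only for this example: number of 1 bits."""
    return sum(bits)


if __name__ == "__main__":
    solution, value = stochastic_hillclimb(one_max, 16, 2000, random.Random(0))
    print(solution, value)
```

A multiple-restart version would call `stochastic_hillclimb` repeatedly within the evaluation budget and keep the best result over all runs.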


Three variants of this algorithm are explored in this paper. The first variant (MRSH-1) maintains a list of
the positions of the bit flips which were attempted without improvement. These bit flips are not attempted
again until a better solution is found. When a better solution is found, the list is emptied. If the list
becomes as large as the solution encoding, then no single bit flip can improve the solution. In this case, 
MRSH-1 is restarted at a random location with an empty list. 
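
The bookkeeping in MRSH-1 can be sketched as follows. This is one possible reading of the description above, not the code used in the experiments; the function signature, the evaluation counter, and the use of a set for the list of failed flips are assumptions of the example.

```python
import random


def mrsh1(evaluate, length, max_evals, rng):
    """Hillclimbing that remembers failed bit flips and restarts when all have failed."""
    best_score, best_vec, evals = float("-inf"), None, 0
    while evals < max_evals:
        # Restart at a random location with an empty list of failed flips.
        v = [rng.randint(0, 1) for _ in range(length)]
        current = evaluate(v)
        evals += 1
        if current > best_score:
            best_score, best_vec = current, list(v)
        failed = set()
        while len(failed) < length and evals < max_evals:
            # Only try positions that have not yet failed for the current vector.
            i = rng.choice([k for k in range(length) if k not in failed])
            v[i] ^= 1
            score = evaluate(v)
            evals += 1
            if score > current:
                current = score
                failed.clear()  # a better solution empties the list
                if current > best_score:
                    best_score, best_vec = current, list(v)
            else:
                v[i] ^= 1  # undo the flip and remember that it failed
                failed.add(i)
    return best_vec, best_score
```

When `failed` holds every position, no single bit flip can improve the current vector, which is exactly the restart condition described above.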

The second and third variants of stochastic hillclimbing (MRSH-2 & MRSH-3) allow moves to regions of
higher and equal evaluation. This differs from MRSH-1, which only allows moves to regions of higher
evaluation. MRSH-2 & 3 differ from each other in the number of evaluations allowed before restarting
search in a random location. In MRSH-2, the number of evaluations is dependent upon the length of the
encoded solution. MRSH-2 allows 10*(length of solution) evaluations without improvement before search is
restarted. When a solution with a higher evaluation is found, the count is reset. MRSH-3 enforces a much
stricter policy of restart; after the total number of iterations is specified, restart is forced 5 times during
search, at equally spaced intervals. 
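
A sketch of the MRSH-2 restart rule is given below. The factor of 10 in the patience limit is read from a partly illegible passage and should be treated as an assumption, as should the function signature; the sketch is illustrative rather than the implementation used here.

```python
import random


def mrsh2(evaluate, length, max_evals, rng, patience_factor=10):
    """Hillclimbing that accepts equal-or-better moves and restarts after a stall."""
    patience = patience_factor * length  # evaluations allowed without improvement
    best_score, best_vec, evals = float("-inf"), None, 0
    while evals < max_evals:
        v = [rng.randint(0, 1) for _ in range(length)]
        current = evaluate(v)
        evals += 1
        if current > best_score:
            best_score, best_vec = current, list(v)
        stale = 0
        while stale < patience and evals < max_evals:
            i = rng.randrange(length)
            v[i] ^= 1
            score = evaluate(v)
            evals += 1
            if score > current:
                current, stale = score, 0  # strict improvement resets the count
                if current > best_score:
                    best_score, best_vec = current, list(v)
            elif score == current:
                stale += 1  # sideways move is kept but does not reset the count
            else:
                v[i] ^= 1  # worse move is undone
                stale += 1
    return best_vec, best_score
```

Accepting moves of equal evaluation lets the search drift across plateaus, which MRSH-1 cannot do.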


3. GENETIC ALGORITHMS 

Genetic algorithms (GAs) are biologically motivated adaptive systems which are based upon the
principles of natural selection and genetic recombination. A GA combines the principles of survival of the
fittest with a randomized information exchange. It has the ability to recognize trends toward optimal
solutions, and to exploit such information by guiding the search toward them. 

In the standard GA, candidate solutions are encoded as fixed length vectors. The initial group of potential 
solutions is chosen randomly. These candidate solutions, called "chromosomes," are allowed to evolve
over a number of generations. At each generation, the fitness of each chromosome is calculated; this is a
measure of how well the chromosome optimizes the objective function. The subsequent generation is
created through a process of selection, recombination, and mutation. The chromosomes are probabilistically
selected for recombination based upon their fitness. General recombination (crossover) operators merge 
the information contained within pairs of selected "parents" by placing random subsets of the informa- 
tion from both parents into their respective positions in a member of the subsequent generation. 
Although the chromosomes with high fitness values have a higher probability of selection for
recombination than those with low fitness values, they are not guaranteed to appear in the next generation. Due to
the random factors involved in producing "children" chromosomes, the children may, or may not, have 
higher fitness values than their parents. Nevertheless, because of the selective pressure applied through a 
number of generations, the overall trend is towards higher fitness chromosomes. Mutations are used to 
help preserve diversity in the population. Mutations introduce random changes into the chromosomes. A 
good overview of GAs can be found in [Goldberg, 1989] [De Jong, 1975]. 

Two variants of the traditional genetic algorithm are tested in this study. The first, SGA, has the following
parameters: Two-Point crossover, with a Crossover Rate of 100%, Mutation Rate: 0.001, Population Size:
100, Elitist selection (the best chromosome in generation N replaces the worst chromosome in generation
N+1). The second GA used, termed GA-Scale, uses the same parameters, with the following exceptions:
Uniform crossover with a crossover rate of 80%, and the fitness of the worst member in a generation is
subtracted from the fitnesses of each member of the generation before the probabilities of selection are
determined. Both GAs are generational, and both employ the elitist selection mechanism described
above.
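
The difference between the two GAs can be illustrated with a short sketch of the selection probabilities and the two crossover operators. Fitness-proportional selection is assumed here because the text speaks of selection probabilities derived from fitness; the function names and the handling of a zero total are choices made for the example.

```python
import random


def selection_probabilities(fitnesses, scale=False):
    """Fitness-proportional probabilities; with scale=True the worst fitness is subtracted first."""
    if scale:
        worst = min(fitnesses)
        fitnesses = [f - worst for f in fitnesses]
    total = sum(fitnesses)
    if total == 0:
        # All members equal after scaling: fall back to uniform selection (assumption).
        return [1.0 / len(fitnesses)] * len(fitnesses)
    return [f / total for f in fitnesses]


def two_point_crossover(a, b, rng):
    """Child takes the segment between two random cut points from b, the rest from a."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]


def uniform_crossover(a, b, rng):
    """Child takes each position from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


if __name__ == "__main__":
    print(selection_probabilities([10.0, 11.0, 12.0]))
    print(selection_probabilities([10.0, 11.0, 12.0], scale=True))
```

With raw fitnesses 10, 11 and 12 the three members are selected almost equally often, while subtracting the worst value makes the differences decisive. This is why scaling increases selective pressure late in a run, when fitness values are close together.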


4. POPULATION-BASED INCREMENTAL LEARNING 

Population-based incremental learning (PBIL) is a combination of evolutionary optimization and
hillclimbing [Baluja, 1994]. The object of the algorithm is to create a real valued probability vector which,
when sampled, reveals high quality solution vectors with high probability. For example, if a good
solution to a problem can be encoded as a string of alternating 0's and 1's, a suitable final probability vector
would be 0.01, 0.99, 0.01, 0.99, etc.

Initially, the values of the probability vector are set to 0.5. Sampling from this vector yields random
solution vectors because the probability of generating a 1 or 0 is equal. As search progresses, the values in the
probability vector gradually shift to represent high evaluation solution vectors. This is accomplished as 
follows: A number of solution vectors are generated based upon the probabilities specified in the
probability vector. The probability vector is pushed towards the generated solution vector(s) with the highest
evaluation. The distance the probability vector is pushed depends upon the learning rate parameter.
After the probability vector is updated, a new set of solution vectors is produced by sampling from the
updated probability vector, and the cycle is continued. As the search progresses, entries in the probability
vector move away from their initial settings of 0.5 towards either 0.0 or 1.0. The probability vector can be
viewed as a prototype vector for generating solution vectors which have high evaluations with respect to
the available knowledge of the search space. 

This algorithm is an extension of the Equilibrium Genetic Algorithm (EGA) developed in conjunction with
[Juels, 1993, 1994]. Another algorithm related to EGA/PBIL is Bit-Based Simulated Crossover (BSC)
[Syswerda, 1992] [Eshelman & Schaffer, 1993]. BSC regenerates the probability vector at each generation;
it also uses selection probabilities (as do standard GAs) to generate the probability vector. In contrast,
PBIL does not regenerate the probability vector at each generation; rather, the probability vector is
updated through the search procedure. Additionally, PBIL does not use selection probabilities. Instead, it
updates the probability vector using a few (in these experiments, 1) of the best performing individuals.
The manner in which the updates to the probability vector occur is similar to the weight update rule in
supervised competitive learning networks, or the update rules used in Learning Vector Quantization
(LVQ) [Hertz, Krogh & Palmer, 1993]. Many of the heuristics used to make learning more effective in
supervised competitive learning networks (or LVQ), or to increase the speed of learning, can be used with
the PBIL algorithm. This relationship is discussed in greater detail in [Baluja, 1994].

4.1 PBIL's Relation to Genetic Algorithms 

One key feature of the early portions of genetic optimization is the parallelism in the search; many diverse 
points are represented in the population of early generations. As the search progresses, the population of 
the GA tends to converge around a good solution vector in the function space (the respective bit positions 
in the majority of the solution strings converge to the same value). PBIL attempts to create a probability 
vector that is a prototype for high evaluation vectors for the function space being explored. As search 
progresses in PBIL, the values in the probability vector move away from 0.5, towards either 0.0 or 1.0.
Analogously to genetic search, PBIL converges from initial diversity to a single point where the
probabilities are close to either 0.0 or 1.0. At this point, there is a high degree of similarity in the vectors generated.

Because PBIL uses a single probability vector, it may seem to have less expressive power than a GA using
a full population that can represent a large number of points simultaneously. For example, in Figure 2, the
vector representations for populations #1 and #2 are the same although the members of the two
populations are quite different. This appears to be a fundamental limitation of PBIL; a GA would not treat these
two populations the same. A traditional single population GA, however, would not be able to maintain
either of these populations. Because of sampling errors, the population will converge to one point; it will 
not be able to maintain multiple dissimilar points. This phenomenon is summarized below: 


"... the theorem [Fundamental Theorem of Genetic Algorithms [Goldberg, 1989]], 
assumes an infinitely large population size. In a finite size population, even when there is 
no selective advantage for either of two competing alternatives... the population will 
converge to one alternative or the other in finite time (De Jong, 1975; Goldberg & Segrest,
1987). This problem of finite populations is so important that geneticists have given
it a special name, genetic drift. Stochastic errors tend to accumulate, ultimately causing
the population to converge to one alternative or another" [Goldberg & Richardson, 1987].


Similarly, PBIL will converge to a probability vector that represents one of the two solutions in each of the 
populations in Figure 2; the probability vector can only represent one of the dissimilar points.
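
The point made with Figure 2 can be checked directly. The sketch below computes, for each bit position, the fraction of population members holding a 1; the helper name is hypothetical, and the two populations are the 4-bit examples from the figure.

```python
def probability_representation(population):
    """Per-position frequency of 1s in a population of equal-length bit strings."""
    size = len(population)
    length = len(population[0])
    return [sum(int(member[i]) for member in population) / size for i in range(length)]


population_1 = ["0011", "1100", "1100", "0011"]
population_2 = ["1010", "0101", "1010", "0101"]

print(probability_representation(population_1))
print(probability_representation(population_2))
```

Both calls return the same vector of four 0.5 entries, even though the two populations share no member, which is the loss of information discussed above.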


In addition to moving the prototype vector towards the highest evaluation vector, the prototype vector 
can also be moved away from the lowest evaluation vector generated in each generation. However, as the 
prototype vector becomes fixed towards either 0.0 or 1.0 for each bit position, the Hamming distance
between the best and worst generated vectors will diminish. If the Hamming distance between the best
and worst vector is small, moving away from the worst vector is counter-productive, because it also
moves away from the best vector in many of the bit positions. Instead, the probability vector can be
moved away from the values in the worst vector which differ from those in the respective positions of the
best vector. The full algorithm is shown in Figure 3.
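
For readers who prefer executable code, the following is a hedged Python sketch of the PBIL procedure described in this section. The default parameter values are those reported for this study; the function signature, the return values, and the merging of the three update passes into a single loop over bit positions are choices made for the example. Setting `negative_lr` to 0.0 gives the EGA variant.

```python
import random


def pbil(evaluate, length, generations, rng, samples=100, lr=0.1,
         negative_lr=0.075, mut_probability=0.02, mut_shift=0.05):
    """Population-based incremental learning for binary vectors (maximization)."""
    p = [0.5] * length  # probability of generating a 1 in each position
    best_score, best_vec = float("-inf"), None
    for _ in range(generations):
        # Sample a population from the probability vector and evaluate it.
        population = [[1 if rng.random() < p[i] else 0 for i in range(length)]
                      for _ in range(samples)]
        scores = [evaluate(v) for v in population]
        best = population[max(range(samples), key=scores.__getitem__)]
        worst = population[min(range(samples), key=scores.__getitem__)]
        if max(scores) > best_score:
            best_score, best_vec = max(scores), list(best)
        for i in range(length):
            # Move towards the best vector.
            p[i] = p[i] * (1.0 - lr) + best[i] * lr
            # Move away from the worst vector only where it differs from the best.
            if best[i] != worst[i]:
                p[i] = p[i] * (1.0 - negative_lr) + best[i] * negative_lr
            # Occasionally nudge the probability towards 0 or 1 at random.
            if rng.random() < mut_probability:
                direction = 1 if rng.random() > 0.5 else 0
                p[i] = p[i] * (1.0 - mut_shift) + direction * mut_shift
    return best_vec, best_score, p
```

Every update is a convex combination of values in [0, 1], so the entries of `p` always remain valid probabilities.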

        Population #1           Population #2
            0011                    1010
            1100                    0101
            1100                    1010
            0011                    0101

        Representation          Representation
        0.5, 0.5, 0.5, 0.5      0.5, 0.5, 0.5, 0.5

Figure 2: The probability representation of 2 small populations of 4-bit solution
vectors; population size is 4. Notice that the representations for both populations
are the same, although the solution vectors represented are entirely different.

In this study, two variants of the algorithm shown in Figure 3 are used. The first, PBIL, uses the following
parameters: Mutation Probability: 0.02, Mutation Shift: 0.05, Learning Rate: 0.1, and Negative Learning
Rate: 0.075. The second algorithm, the PBIL/EGA algorithm, uses the same parameters with the Negative
Learning Rate set to 0.0.


5. AN EMPIRICAL COMPARISON 

In this section, the algorithms described previously are applied to six classes of problems: Traveling Sales- 
man, jobshop scheduling, knapsack, bin packing, neural network weight optimization, and numerical 
function optimization. The results obtained in this study should not be considered to be state-of-the-art. 
The problem encodings were chosen to be easily reproducible, and to allow easy and fair comparison
with other studies. Alternate encodings may yield superior results. In addition, no problem-specific
information was used for any of the algorithms. In the cases in which problem-specific information is
available, it may be able to help all of the search algorithms presented in this study.

In the problems presented in this paper, all of the variables were encoded either with Gray-code or stan- 
dard base-2 representation, as indicated with the problem. The variables were represented in non-over- 
lapping, contiguous positions within the chromosome (solution encoding). The results reported are the 
best evaluations found through the search of each algorithm, averaged over at least 20 independent runs 
per algorithm per problem. In the problems in which random values are assigned to problem attributes
(such as the location of cities in the Traveling Salesman Problems or sizes of elements in the bin packing
and knapsack problems), the values are consistent across all algorithms attempted and across all 20 trials 
for each algorithm. 

All algorithms were allowed an equal number of evaluations per run (200,000). In each run, the GA and 
PBIL algorithms both were allowed 2000 generations, with 100 function evaluations per generation. In 
each run, the MRSH algorithms were restarted in random locations as many times as needed until 
200,000 evaluations were performed. The best answer ever found in the 200,000 evaluations was returned 
as the best answer found in the run. The final results for the problems are given in tables following the 
description of the problems. The best results are highlighted. 

    ***** Initialize Probability Vector *****
    for i := 1 to LENGTH do P[i] := 0.5;

    while (NOT termination condition)

        ***** Generate Samples *****
        for i := 1 to SAMPLES do
            sample_vectors[i] := generate_sample_vector_according_to_probabilities (P);
            evaluations[i] := Evaluate_Solution (sample_vectors[i]);

        best_vector := find_vector_with_best_evaluation (sample_vectors, evaluations);
        worst_vector := find_vector_with_worst_evaluation (sample_vectors, evaluations);

        ***** Update Probability Vector towards best solution *****
        for i := 1 to LENGTH do
            P[i] := P[i] * (1.0 - LR) + best_vector[i] * (LR);

        ***** Update Probability Vector away from worst solution *****
        for i := 1 to LENGTH do
            if (best_vector[i] ≠ worst_vector[i]) then
                P[i] := P[i] * (1.0 - NEGATIVE_LR) + best_vector[i] * (NEGATIVE_LR);

        ***** Mutate Probability Vector *****
        for i := 1 to LENGTH do
            if (random (0,1) < MUT_PROBABILITY) then
                if (random (0,1) > 0.5) then mutate_direction := 1
                else mutate_direction := 0;
                P[i] := P[i] * (1.0 - MUT_SHIFT) + mutate_direction * (MUT_SHIFT);

    USER DEFINED CONSTANTS (Values Used in this Study):
    SAMPLES: the number of vectors generated before update of the probability vector (100).
    LR: the learning rate, how fast to exploit the search performed (0.1).
    NEGATIVE_LR: the negative learning rate, how much to learn from negative examples (PBIL = 0.075, EGA = 0.0).
    LENGTH: the number of bits in a generated vector (problem specific).
    MUT_PROBABILITY: the probability of a mutation occurring in each position (0.02).
    MUT_SHIFT: the amount a mutation alters the value in the bit position (0.05).

Figure 3: The PBIL/EGA algorithm for a binary alphabet.


5.1 Traveling Salesman Problems (TSP) 

The TSP problem is probably the most famous of the NP-complete problems. Given N cities, the object is
to find a minimum length tour which visits each city exactly once. The encoding used in this study
requires a bit string of size N log2 N bits. Each city is assigned a substring of length log2 N which is
interpreted as an integer. The city with the lowest integer value comes first in the tour, the city with the second
lowest comes second, etc. In the case of ties, the city whose substring comes first in the bit string comes
first in the tour. This encoding was taken from [Syswerda, 1992]. To minimize the tour length, the
evaluation used is 1.0/Tour_Length. Four problems were attempted: the first contained 128 cities, the second
contained 200 cities, and the third and fourth contained 255 cities. The integer encoding of the fourth
problem used Gray-code, while the rest used standard binary code. The results for these four problems
are shown in Table I. The distances between cities were generated randomly for each problem.
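
The tour encoding can be sketched as below for the standard binary case. Two details are assumptions of the example rather than statements from the text: the substring width is taken as log2 N rounded up to a whole number of bits, and the tour is closed by returning to the first city. The function names and the small distance matrix are illustrative.

```python
def decode_tour(bits, n_cities):
    """Order cities by the integer value of their substrings; ties keep bit-string order."""
    width = (n_cities - 1).bit_length()  # log2(N) rounded up (assumption)
    keys = [int("".join(str(b) for b in bits[c * width:(c + 1) * width]), 2)
            for c in range(n_cities)]
    # sorted() is stable, so cities with equal keys stay in bit-string order.
    return sorted(range(n_cities), key=lambda c: keys[c])


def tour_length(tour, dist):
    """Length of the closed tour under the distance matrix `dist`."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def evaluate_tour(bits, dist):
    """Evaluation used for maximization: 1.0 / Tour_Length."""
    return 1.0 / tour_length(decode_tour(bits, len(dist)), dist)


if __name__ == "__main__":
    # Four cities on a line; keys are 2, 0, 2, 1, so the tour is 1, 3, 0, 2.
    dist = [[abs(i - j) for j in range(4)] for i in range(4)]
    bits = [1, 0, 0, 0, 1, 0, 0, 1]
    print(decode_tour(bits, 4), evaluate_tour(bits, dist))
```

Because every bit string decodes to a legal tour, no repair step is needed, at the cost of many bit strings mapping to the same tour.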
Table I: Traveling Salesman Problem - Average Final Tour Length 

5.2 Jobshop Scheduling Problems 

Recently, genetic algorithms have been applied to the jobshop scheduling problem because it is difficult
for conventional search based methods to find near-optimal solutions in a reasonable amount of time
[Fang et al., 1993]. A good description of the jobshop problem is given by Fang:


"In the general job shop problem, there are j jobs and m machines; each job comprises a
set of tasks which must each be done on a different machine for different specified
processing times, in a given job-dependent order. ... A legal schedule is a schedule of job
sequences on each machine such that each job's task order is preserved, a machine is not
processing two different jobs at once, and different tasks of the same job are not
simultaneously being processed on different machines. The problem is to minimise the total
elapsed time between the beginning of the first task and the completion of the last task
(the makespan)" [Fang et al., 1993].

The problem is encoded in two ways. The first encoding is derived from [Fang et al., 1993]. The exact
encoding can be found in [Fang et al., 1993] and [Baluja, 1994]. The difference between this encoding and
that used by Fang is that in this study, bit strings were used to encode the integers (in standard binary
encoding) in the range of 1..J. Fang used chunks which are atomic for the GA. Although the encoding
used here makes the problem difficult for these optimization techniques, it is used to provide results
which are comparable to other algorithms. As the makespan is to be minimized, the evaluation of the
potential solution is (1.0/makespan). Two standard test problems are attempted, a 10-job, 10-machine
problem and a 20-job, 5-machine problem. A description of the problems can be found in [Muth &
Thompson, 1963]. The results are shown in Table II.


Table II: Jobshop Scheduling - Minimum Makespan - Using First Encoding.

[Table columns: PROBLEM, MRSH1, MRSH2, MRSH3, EGA, PBIL, SGA; the table values are not recoverable.]

The second encoding is similar to the encoding used in the Traveling Salesman Problem. The drawback
of this encoding is that it uses more bits than the previous one. Nonetheless, empirically, it revealed
improved results. Each job is assigned M entries of size log2(J*M) bits. The total length of the encoding is
J*M*log2(J*M). Therefore, each of these problems is encoded in a 700 bit solution vector. The value of each
entry (of length log2(J*M)) determines the order in which the jobs are scheduled. The job which contains
the smallest valued entry is scheduled first, etc. The order in which the machines are selected for each job
depends upon the ordering required by the problem specification. The results are shown in Table III. With
this encoding, six out of the seven algorithms perform better than, or at least as well as, the first jobshop
encoding presented (the performance of MRSH-1 does not improve with this encoding).


Table III: Jobshop Scheduling - Minimum Makespan - Using Second Encoding 

5.3 Knapsack Problem 

In the knapsack problem, there is a single bin of limited capacity, and M elements of varying sizes and
values. The problem is to select the elements which will yield the greatest summed value without
exceeding the capacity of the bin. The evaluation of the quality of the solution is judged in two ways: If the
solution selects too many elements, such that the summed size of the elements is too large, the solution is
judged by how much it exceeds the capacity of the bin - the less it exceeds the capacity, the better the
solution. If the sum of the element sizes is within the capacity of the bin, the sum of the values of the selected
elements is used as the evaluation. To ensure that the solutions which overfill the bin are not competitive 
with those which do not, their evaluations are multiplied by a small constant. This makes the invalid 
solutions competitive only when there are no solutions in the population which are valid. The evalua- 
tions are described below. 
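
The two-case evaluation can be sketched as follows. The exact formulas are not reproduced in this copy of the report, so the form of the overfull score (a small constant divided by the excess) and the value of that constant are purely hypothetical stand-ins that satisfy the verbal description: less overflow scores higher, and any overfull solution scores below a valid one with positive value.

```python
def knapsack_evaluation(bits, sizes, values, capacity, penalty=1e-6):
    """Two-case evaluation for one-bit-per-element knapsack encodings (maximization)."""
    total_size = sum(s for b, s in zip(bits, sizes) if b)
    if total_size > capacity:
        # Overfull: hypothetical score that grows as the excess shrinks,
        # multiplied by a small constant so it stays below valid scores.
        excess = total_size - capacity
        return penalty / excess
    # Valid: summed value of the selected elements.
    return float(sum(v for b, v in zip(bits, values) if b))


if __name__ == "__main__":
    sizes, values, capacity = [3, 4, 5], [10, 20, 30], 8
    for candidate in ([1, 0, 1], [0, 1, 1], [1, 1, 1]):
        print(candidate, knapsack_evaluation(candidate, sizes, values, capacity))
```

Under this stand-in, an empty selection scores 0.0 and would rank below overfull solutions; whether the original evaluation behaves the same way cannot be determined from the text.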
The weights and values for each problem were randomly generated. In the first two problems, having 512
and 2000 elements respectively, a unique element is represented by each bit. When a bit is set to 1, the
corresponding element is included. In the third and fourth problems, there are 100 and 120 unique elements,
respectively. However, there are 8 and 32 copies of each element; the number of elements of each type
which are included in the solution is determined by interpreting a bit string, of length 3 (log2 8) bits and 5
(log2 32) bits respectively, as a decimal number. The results are given in Table IV. Note that the SGA algorithm
was unable to find valid solutions in the second and fourth problems.

Table IV: Knapsack Problem - Average of Best Values 

5.4 Bin Packing 

In the bin packing problem, there are N bins of varying capacities and M elements of varying sizes. The
problem is to pack the bins with elements as tightly as possible, without exceeding the maximum
capacity of any bin. In the problems attempted here, the error is measured by:

$$ \mathrm{ERROR} = \sum_{i=1}^{N} \left| \mathrm{CAP}_i - \mathrm{ASSIGNED}_i \right| $$

where CAP_i is the capacity of bin i, and ASSIGNED_i is the total size of the elements in bin i.

The solution is encoded in a bit string of length M * log2 N. Each element to be packed is assigned a
sequential substring of length log2 N whose value indicates the bin in which the element is placed. In
order to minimize the ERROR, the evaluation of the potential solution is 1.0/ERROR.
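
A sketch of this encoding and evaluation is given below. It assumes the error is the sum over bins of the absolute difference between capacity and assigned size, that the number of bins is a power of two so that every substring names a valid bin, and that a zero error is mapped to an infinite evaluation; the last point is a guard added for the example, and all function names are illustrative.

```python
def decode_assignment(bits, n_bins, n_elements):
    """Read one fixed-width substring per element and interpret it as a bin index."""
    width = (n_bins - 1).bit_length()  # log2(N) bits for N a power of two
    return [int("".join(str(b) for b in bits[e * width:(e + 1) * width]), 2)
            for e in range(n_elements)]


def packing_error(assignment, sizes, capacities):
    """Sum over bins of |capacity - total size assigned to the bin|."""
    assigned = [0.0] * len(capacities)
    for element, bin_index in enumerate(assignment):
        assigned[bin_index] += sizes[element]
    return sum(abs(c - a) for c, a in zip(capacities, assigned))


def packing_evaluation(bits, sizes, capacities):
    """Evaluation for maximization: 1.0 / ERROR, with a guard for a perfect packing."""
    assignment = decode_assignment(bits, len(capacities), len(sizes))
    error = packing_error(assignment, sizes, capacities)
    return float("inf") if error == 0 else 1.0 / error


if __name__ == "__main__":
    sizes, capacities = [2, 3, 4, 1], [5, 5]
    print(packing_evaluation([0, 0, 1, 1], sizes, capacities))
    print(packing_evaluation([0, 1, 1, 1], sizes, capacities))
```

The guard matters because the problems in this study were generated so that a packing with zero error exists.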

Four bin packing problems of various sizes were tested: 32 bins, 128 elements; 16 bins, 128 elements; 1
bins, 256 elements; and 2 bins, 512 elements. All of the problems generated were guaranteed to have a
solution with 0.0 error. The results are shown below, in Table V.


Table V: Bin Packing Problems - Minimum Error 

5.5 Evolving Weights for an Artificial Neural Network (ANN) 

Recently, evolutionary algorithms have been used to evolve the weights of artificial neural networks. In
the experiments reported here, the weights of two small predefined network architectures were evolved.
In the first test, the object of the neural network was to identify the parity of 7 inputs. The inputs were
either 0 (represented by -0.5) or 1 (represented by 0.5). If the parity was 1, the target output was 0.5; if the
parity was 0, the target output was -0.5. The evaluation was the sum of squares error on the 128 training
examples. A bias input (a unit whose input is set to 1.0) was also used; this has connections to the hidden
and the output units [Hertz, Krogh & Palmer, 1993]. The network architecture consisted of 8 input units
(including bias), 5 hidden units, and 1 output unit. The network was fully connected between sequential
layers. There were a total of 46 connections in the network; the values of the weights were restricted to the
range of -10.0 to +10.0. All hidden and output units used a sigmoid activation function. Weights were
represented as binary and Gray code, and were assigned 8 non-overlapping bits in the solution string.
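As an illustration of the weight representation, the sketch below decodes a 368-bit solution string (46 weights of 8 bits each) into real-valued weights. The linear mapping of the integer range 0 to 255 onto -10.0 to +10.0 with both endpoints included, the most-significant-bit-first order, and the function names are assumptions; the text states only the number of bits and the weight range.

```python
def gray_to_binary(bits):
    """Convert a reflected Gray code field to standard binary."""
    out = [bits[0]]
    for g in bits[1:]:
        out.append(out[-1] ^ g)  # each binary bit is the previous binary bit XOR the Gray bit
    return out


def bits_to_int(bits):
    value = 0
    for bit in bits:  # most significant bit first (assumed bit order)
        value = 2 * value + bit
    return value


def decode_weights(solution, n_weights=46, bits_per_weight=8,
                   lo=-10.0, hi=10.0, gray=False):
    """Map each non-overlapping field of the solution string to a weight in [lo, hi]."""
    if len(solution) != n_weights * bits_per_weight:
        raise ValueError("solution string has the wrong length")
    max_int = 2 ** bits_per_weight - 1
    weights = []
    for k in range(n_weights):
        field = solution[k * bits_per_weight:(k + 1) * bits_per_weight]
        if gray:
            field = gray_to_binary(field)
        weights.append(lo + (hi - lo) * bits_to_int(field) / max_int)
    return weights


weights = decode_weights([0] * 368)
print(len(weights), weights[0])
```

The same string decodes to different weights under the two codes, which is why each problem is reported once per representation.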


Figure 4: Network Architecture.
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In the second two tests, eight real valued inputs were used. The first two inputs represent the coordinates
of a point within a square with upper left corner (ULC) of (-1.0, 1.0) and lower right corner (LRC) of (1.0,
-1.0). The task was to determine whether the point fell into a square region between ULC (-0.75, 0.75) and
LRC (0.75, -0.75) and outside a smaller square with ULC (-0.35, 0.35) and LRC (0.35, -0.35). A diagram of
this is shown in Figure 5. Five inputs contained random noise in the region [-1, 1]. This noise was
determined in the beginning of the run, and remained the same, in each training example, throughout the run.
The last input was a bias unit. In total, the network had 8 inputs (including bias), 5 hidden units, and 1
output; this created 46 connections. For training, 100 uniformly distributed examples were used. The
same representation and scaling of weights was used as in the previous problem. In these two problems,
weights were represented as binary and Gray code, respectively. The results are shown in Table VI.
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Table VI: ANN Weight Optimization - Sum of Squares Error.
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5.6 Numerical Function Optimization 

In this section, the seven algorithms are compared on three numerical optimization problems. In the first
and second problems, the variables in the first portions of the solution string have a large influence on the
quality of the rest of the solution; small changes in their values can cause large changes in the evaluation
of the solution. In the third problem, each variable can be set independently. Each variable was
represented using 9 bits, and was scaled uniformly into the range ±2.56. To avoid a division by zero error, a
small constant, C (= 0.00001), was added to the denominator of each function. Each problem was tested
with the variables represented in standard binary and Gray code. Results are shown in Table VII. The
maximization functions are:
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6. SUMMARY OF EMPIRICAL RESULTS
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Many results have been presented in the previous section. This paper has concentrated on breadth; a
large number of problems were attempted with seven optimization heuristics. It should be noted that
because all of the algorithms have tunable parameters, it is possible that different settings may yield
different results. Additional mechanisms, which take advantage of problem specific information, may also
improve the performance of each of these methods. Nonetheless, by selecting a variety of problems and
problem sizes to compare, all of the algorithms should show their strengths and weaknesses in some
portion of the test set.

The relative ranks of the algorithms on all of the problems are shown in Table VII; this table ranks the
algorithms with respect to the average best results produced over all runs. It should be noted that with
only 20 runs per algorithm, not all of the differences are statistically significant. More details on the
differentials between each algorithm's performance were presented in Section 5. In terms of the final solutions
found, the PBIL algorithm worked the best overall, followed by the EGA algorithm. In the majority of
problems attempted here (23 out of 27), learning from negative examples improved the quality of the
final solutions found (PBIL performed better than EGA). Only in two of the problems did the negative
learning hurt the performance of the PBIL algorithm (EGA performed better than PBIL).

In terms of clock speed, the MRSH algorithms worked the fastest. However, if the time for each
chromosome/solution string's evaluation is much larger than the time for the algorithm's procedures, this
benefit diminishes. Moves to equal regions (rather than only strictly better regions) had mixed results overall.
Nonetheless, in several problems, such as the jobshop (both encodings) and TSP problems, the moves to
equal regions improved performance. In other problem sets, not explored here, it was also found that
moves to regions of equal evaluation were important for good performance [Juels & Wattenberg, 1994].
MRSH did well on the largest TSP examined; it was able to find a shorter tour than the other algorithms
(when encoded in binary and Gray code). Similarly, in F3, MRSH was able to take the largest advantage
of the Gray code.
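The difference between accepting only strictly better moves and also accepting moves to regions of equal evaluation can be made concrete with a small sketch. It is not the MRSH implementation used in this study: the single-bit-flip move, the absence of restarts, and all names are assumptions chosen to keep the example short, and the function being maximized is a stand-in.

```python
import random


def accept(candidate_value, current_value, allow_equal):
    """Accept strictly better moves; optionally accept equal-valued moves too."""
    if candidate_value > current_value:
        return True
    return allow_equal and candidate_value == current_value


def hill_climb(evaluate, n_bits, max_evals, allow_equal, rng):
    """One stochastic hillclimbing run on a bit string (maximization)."""
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    current_value = evaluate(current)
    for _ in range(max_evals - 1):
        candidate = list(current)
        flip = rng.randrange(n_bits)  # assumed move: flip one randomly chosen bit
        candidate[flip] = 1 - candidate[flip]
        candidate_value = evaluate(candidate)
        if accept(candidate_value, current_value, allow_equal):
            current, current_value = candidate, candidate_value
    return current, current_value


# Stand-in objective: count of 1 bits.
solution, value = hill_climb(sum, n_bits=20, max_evals=2000,
                             allow_equal=True, rng=random.Random(0))
print(value)
```

With `allow_equal=True` the search can drift across plateaus of equal evaluation instead of stalling at their edge. A multiple-restart variant would wrap `hill_climb` in a loop and keep the best result.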

Although the standard genetic algorithm (SGA) performed only as well as the MRSH algorithms, the
GA-Scale algorithm performed slightly better. A summary of the results can be found in Table VIII. This
table has the following columns:


1. In the cases in which PBIL did better than GA-Scale, this column gives the generation in which
PBIL was able, on average, to surpass the highest evaluation GA-Scale found, on average, in its
20 runs. For example, in the first problem, TSP128 (binary), the highest evaluation of the GA
was 2275.8 (Table I); by generation 210, PBIL was able to surpass this evaluation.

2. The same numbers are given for GA-Scale. For example, on the knapsack01 (2000 elem., 1 copy)
problem, the highest evaluation PBIL was able to obtain was 403.7; in generation 150, GA-Scale
was, on average, able to surpass it.
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3. Marks the problems on which any form of MRSH was able to do better than GA-Scale.

4. Marks when any form of MRSH did better than PBIL.

5. Marks the problems in which GA-Scale outperformed PBIL.


Co'hwn? o.J '.u:J *S ef //// ;iif+ 'nnt \\i*vrt! »i< 


6. Indicates the problems on which moves to equal regions helped the performance of MRSH.
These are the problems in which either MRSH-2 or MRSH-3 did better than MRSH-1.

7. Marks the problems in which learning from negative examples helped the PBIL/EGA algorithm.
These are the problems in which PBIL performed better than EGA.

8. Indicates the problems in which GA-Scale did better than SGA. The improvement may be due
to the scaling of fitness values and/or the different crossover operators (GA-Scale: Uniform,
SGA: Two Point).

For the problems which were attempted with Gray and binary code, Table IX provides a list of which
algorithms benefited from using Gray code.


7. CONCLUSIONS 


This paper has presented results on many problems. From the results reported in this paper, it is evident
that algorithms which are simpler than standard GAs can perform comparably to GAs, on both small and
large problems. Other studies have also shown this for various sets of problems [Juels & Wattenberg,
1994][Mitchell & Forrest, 1992], etc. In studies analyzing the performance of GAs on particular problems,
these results suggest that analyses should include comparisons not only to other GAs, but also to other
simpler methods of optimization before a benefit is claimed in favor of GAs. This study did not include
techniques such as Simulated Annealing or Tabu Search, which should be included in the future.

It is interesting to note that the PBIL algorithm, which does not use the crossover operator, and redefines
the role of the population to one which is very different than that of a GA, performs either better than or
comparably to a GA on the majority of the problems. PBIL and GAs both generate new trials based on
statistics from a population of prior trials. The PBIL algorithm explicitly maintains these statistics, while
the GA implicitly maintains them in its population. The GA extracts the statistics by the selection and
crossover operators. More detailed comparisons between these two algorithms can be found in [Baluja &
Caruana, 1995].
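To make "explicitly maintains these statistics" concrete, the sketch below shows one common formulation of a PBIL-style update, in which a probability vector is moved toward the best sample of each generation and, for learning from negative examples, moved further on the bits where the best and worst samples differ. It is a simplified illustration under stated assumptions: the parameter values, the function names, the omission of mutation on the probability vector, and the stand-in objective are not taken from this section.

```python
import random


def update_probabilities(p, best, worst, lr, neg_lr):
    """Move each probability toward the best sample; move further where best != worst."""
    new_p = []
    for p_i, b, w in zip(p, best, worst):
        p_i = p_i * (1.0 - lr) + b * lr  # positive learning
        if b != w:
            p_i = p_i * (1.0 - neg_lr) + b * neg_lr  # learning from the negative example
        new_p.append(p_i)
    return new_p


def pbil(evaluate, n_bits, generations, samples, lr, neg_lr, rng):
    """Maximize `evaluate` over bit strings; returns (probabilities, best, best value)."""
    p = [0.5] * n_bits  # the explicit statistics: one probability per bit position
    best_ever, best_ever_value = None, float("-inf")
    for _ in range(generations):
        population = [[1 if rng.random() < p_i else 0 for p_i in p]
                      for _ in range(samples)]
        ranked = sorted(population, key=evaluate)
        worst, best = ranked[0], ranked[-1]
        best_value = evaluate(best)
        if best_value > best_ever_value:
            best_ever, best_ever_value = best, best_value
        p = update_probabilities(p, best, worst, lr, neg_lr)
    return p, best_ever, best_ever_value


# Stand-in objective: count of 1 bits. Parameter values are illustrative only.
p, best, value = pbil(sum, n_bits=16, generations=50, samples=50,
                      lr=0.1, neg_lr=0.075, rng=random.Random(1))
print(value)
```

Setting `neg_lr` to 0 removes the negative-example step. The population here is discarded after each generation; only the probability vector carries information forward, in contrast to a GA, where the population itself does.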

It should be noted that the relative performance of GAs in comparison to PBIL will improve as the
population size of the GA increases [Baluja & Caruana, 1995]. As the population size of the GA increases, the
GA will be able to maintain more dissimilar points in its population, and will therefore be able to use
crossover more effectively. On the other hand, in its current implementation, the PBIL algorithm only uses
a few solution vectors for updating the probability vector regardless of the population size. Nonetheless,
the large population size needed by a GA is not always feasible because of the need to balance the
number of generations required and the total number of evaluations possible. However, even when the
resources to use large populations are available, a large amount of empirical work has shown that using
a "parallel island-model" GA may be more effective than a single panmictic population. The
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island model evolves multiple small populations in parallel, with only a small amount of interaction between
subpopulations. The parallel subpopulation model, in many cases, has outperformed the use of a single
large population; see for example: [Davidor et al., 1993][Gordon & Whitley, 1993][Baluja, 1993][Whitley &
Starkweather, 1990]. If these parallel subpopulations are used instead of a single large population, each
subpopulation can be modeled with individual probability vectors, as in the PBIL algorithm.

A GA with different mechanisms, such as non-stationary mutation rates, local optimization heuristics,
parallel subpopulations, specialized crossover, or larger operating alphabets, may perform better than
the GAs explored here. It should be noted, however, that all of these mechanisms, with the exception of
specialized crossover operators, can be used with PBIL with few, if any, modifications.

Another direction to explore in the future is how these algorithms perform with alternate solution
encodings; in this study only binary encodings were used. Although work has already been conducted in this
area with GAs (a good introduction to this can be found in [Eshelman & Schaffer, 1992]), how well will
PBIL or MRSH perform with these alternate encodings? Finally, in this study optimization was only
explored in static environments. Future research should also include search and optimization in dynamic
environments, or environments which require maximization of cumulative payoff. The adaptive nature
of GAs may reveal a pronounced benefit in these more complex domains.
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Table VIII: Comparison of Methods
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